Search CORE

3 research outputs found

Analyse des synchronisations dans un programme parallèle ordonnancé par vol de travail. Applications à la génération déterministe de nombres pseudo-aléatoires.

Author: Mor Stefano Drimon Kurz
Publication venue: HAL CCSD
Publication date: 01/01/2015
Field of study

We present two contributions to the field of parallel programming.The first contribution is theoretical: we introduce SIPS analysis, a novel approach to estimate the number of synchronizations performed during the execution of a parallel algorithm.Based on the concept of logical clocks, it allows us: on one hand, to deliver new bounds for the number of synchronizations, in expectation; on the other hand, to design more efficient parallel programs by dynamic adaptation of the granularity.The second contribution is pragmatic: we present an efficient parallelization strategy for pseudorandom number generation, independent of the number of concurrent processes participating in a computation.As an alternative to the use of one sequential generator per process, we introduce a generic API called Par-R, which is designed and analyzed using SIPS.Its main characteristic is the use of a sequential generator that can perform a ``jump-ahead'' directly from one number to another on an arbitrary distance within the pseudorandom sequence.Thanks to SIPS, we show that, in expectation, within an execution scheduled by work stealing of a "very parallel" program (whose depth or critical path is subtle when compared to the work or number of operations), these operations are rare.Par-R is compared with the parallel pseudorandom number generator DotMix, written for the Cilk Plus dynamic multithreading platform.The theoretical overhead of Par-R compares favorably to DotMix's overhead, what is confirmed experimentally, while not requiring a fixed generator underneath.Nous présentons deux contributions dans le domaine de la programmation parallèle.La première est théorique : nous introduisons l'analyse SIPS, une approche nouvelle pour dénombrer le nombre d'opérations de synchronisation durant l'exécution d'un algorithme parallèle ordonnancé par vol de travail.Basée sur le concept d'horloges logiques, elle nous permet,: d'une part de donner de nouvelles majorations de coût en moyenne; d'autre part de concevoir des programmes parallèles plus efficaces par adaptation dynamique de la granularité.La seconde contribution est pragmatique: nous présentons une parallélisation générique d'algorithmes pour la génération déterministe de nombres pseudo-aléatoires, indépendamment du nombre de processus concurrents lors de l'exécution.Alternative à l'utilisation d'un générateur pseudo-aléatoire séquentiel par processus, nous introduisons une API générique, appelée Par-R qui est conçue et analysée grâce à SIPS.Sa caractéristique principale est d'exploiter un générateur séquentiel qui peut "sauter" directement d'un nombre à un autre situé à une distance arbitraire dans la séquence pseudo-aléatoire.Grâce à l'analyse SIPS, nous montrons qu'en moyenne, lors d'une exécution par vol de travail d'un programme très parallèle (dont la profondeur ou chemin critique est très petite devant le travail ou nombre d'opérations), ces opérations de saut sont rares.Par-R est comparé au générateur pseudo-aléatoire DotMix, écrit pour Cilk Plus, une extension de C/C++ pour la programmation parallèle par vol de travail.Le surcout théorique de Par-R se compare favorablement au surcoput de DotMix, ce qui apparait aussi expériemntalement.De plus, étant générique, Par-R est indépendant du générateur séquentiel sous-jacent

Thèses en Ligne

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Hal - Université Grenoble Alpes

Lume 5.8

Analysis of Synchronizations In Greedy-Scheduled Executions - Application to Efficient Generation of Pseudorandom Numbers in Parallel

Author: Mor Stefano Drimon Kurz
Publication venue
Publication date: 26/10/2015
Field of study

Nous présentons deux contributions dans le domaine de la programmation parallèle.La première est théorique : nous introduisons l'analyse SIPS, une approche nouvelle pour dénombrer le nombre d'opérations de synchronisation durant l'exécution d'un algorithme parallèle ordonnancé par vol de travail.Basée sur le concept d'horloges logiques, elle nous permet,: d'une part de donner de nouvelles majorations de coût en moyenne; d'autre part de concevoir des programmes parallèles plus efficaces par adaptation dynamique de la granularité.La seconde contribution est pragmatique: nous présentons une parallélisation générique d'algorithmes pour la génération déterministe de nombres pseudo-aléatoires, indépendamment du nombre de processus concurrents lors de l'exécution.Alternative à l'utilisation d'un générateur pseudo-aléatoire séquentiel par processus, nous introduisons une API générique, appelée Par-R qui est conçue et analysée grâce à SIPS.Sa caractéristique principale est d'exploiter un générateur séquentiel qui peut "sauter" directement d'un nombre à un autre situé à une distance arbitraire dans la séquence pseudo-aléatoire.Grâce à l'analyse SIPS, nous montrons qu'en moyenne, lors d'une exécution par vol de travail d'un programme très parallèle (dont la profondeur ou chemin critique est très petite devant le travail ou nombre d'opérations), ces opérations de saut sont rares.Par-R est comparé au générateur pseudo-aléatoire DotMix, écrit pour Cilk Plus, une extension de C/C++ pour la programmation parallèle par vol de travail.Le surcout théorique de Par-R se compare favorablement au surcoput de DotMix, ce qui apparait aussi expériemntalement.De plus, étant générique, Par-R est indépendant du générateur séquentiel sous-jacent.We present two contributions to the field of parallel programming.The first contribution is theoretical: we introduce SIPS analysis, a novel approach to estimate the number of synchronizations performed during the execution of a parallel algorithm.Based on the concept of logical clocks, it allows us: on one hand, to deliver new bounds for the number of synchronizations, in expectation; on the other hand, to design more efficient parallel programs by dynamic adaptation of the granularity.The second contribution is pragmatic: we present an efficient parallelization strategy for pseudorandom number generation, independent of the number of concurrent processes participating in a computation.As an alternative to the use of one sequential generator per process, we introduce a generic API called Par-R, which is designed and analyzed using SIPS.Its main characteristic is the use of a sequential generator that can perform a ``jump-ahead'' directly from one number to another on an arbitrary distance within the pseudorandom sequence.Thanks to SIPS, we show that, in expectation, within an execution scheduled by work stealing of a "very parallel" program (whose depth or critical path is subtle when compared to the work or number of operations), these operations are rare.Par-R is compared with the parallel pseudorandom number generator DotMix, written for the Cilk Plus dynamic multithreading platform.The theoretical overhead of Par-R compares favorably to DotMix's overhead, what is confirmed experimentally, while not requiring a fixed generator underneath

Theses.fr

Efficent on-line scheduling of recursive fork-join programs on MPI

Author: Mor Stefano Drimon Kurz
Publication venue
Publication date: 01/01/2010
Field of study

Esta Dissertação de Mestrado propõe dois novos algoritmos para tornar mais eficiente o escalonamento on-line de tarefas com dependências estritas em agregados de computadores que usam como middleware para troca de mensagens alguma implementação da MPI (até a versão 2.1). Esses algoritmos foram projetados tendo-se em vista programas construídos no modelo de programação fork/join, onde a operação de fork é usada sobre uma chamada recursiva da função. São eles: 1. O algoritmo RatMD, implementado através de uma biblioteca de primitivas do tipo map-reduce, que funciona para qualquer implementação MPI, com qualquer versão da norma. Utilizado para minimizar o tempo de execução de uma computação paralela; e 2. O algoritmo RtMPD, implementado através de um sistema distribuído sobre daemons gerenciadores de processos criados dinamicamente com a implementação MPICH2 (que implementa a MPI-2). Utilizado para permitir execuções de instâncias maiores de programas paralelos dinâmicos. Ambos se baseiam em roubo de tarefas, que é a estratégia de balanceamento de carga mais difundida na literatura. Para ambos os algoritmos apresenta-se modelagem téorica de custos. Resultados experimentais obtidos ficam dentro dos limites teóricos calculados. RatMD provê uma redução no tempo de execução de até 80% em relação ao algoritmo usual (baseado em round-robin), com manutenção do speedup próximo ao linear e complexidade espacial idêntica à popular implementação com round-robin. RtMPD mantém, no mínimo, o mesmo desempenho que a implementação canônica do escalonamento em MPICH2, dobrando-se o limite físico de processos executados simultaneamente por cada nó.This Master’s Dissertation proposes two new algorithms for improvement on on-line scheduling of dynamic-created tasks with strict dependencies on clusters of computers using MPI (up to version 2.1) as its middleware for message-passing communication. These algorithms were built targeting programs written on the fork-join model, where the fork operation is always called over an recursive function call. They are: 1. RatMD, implemented as a map-reduce library working for any MPI implementation, on whatever norm’s version. Used for performance gain; and 2. RtMPD, implemented as a distributed system over dynamic-generated processes manager daemons with MPICH2 implentation of MPI. Used for executing larger instances of dynamic parallel programs. Both algorithms are based on the (literature consolidated) work stealing technique and have formal guarantees on its execution time and load balancing. Experimental results are within theoretical bounds. RatMD shows an improvement on the performance up to 80% when paired with more usual algorithms (based on round-robin strategy). It also provides near-linear speedup and just about the same space-complexity on similar implementations. RtMPD keeps, at minimum, the very same performance of the canonical MPICH2 implementation, near doubling the physical limit of simultaneous program execution per cluster node

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Lume 5.8

RCAAP - Repositório Científico de Acesso Aberto de Portugal